Detecting Sentence Boundaries in Sanskrit Texts

Author

Oliver Hellwig

Abstract

The paper applies a deep recurrent neural network to the task of sentence boundary detection in Sanskrit, an important, yet underresourced ancient Indian language. The deep learning approach improves the F scores set by a metrical baseline and by a Conditional Random Field classifier by more than 10%.

Related resources

Building a Word Segmenter for Sanskrit Overnight

There is an abundance of digitised texts available in Sanskrit. However, the word segmentation task in such texts is challenging due to the issue of Sandhi. In Sandhi, words in a sentence often fuse together to form a single chunk of text, where the word delimiter vanishes and sounds at the word boundaries undergo transformations, which is also reflected in the written text. Here, we propose an a...

Comparing Sanskrit Texts for Critical Editions: The Sequences Move Problem

A critical edition takes into account various versions of the same text in order to show the differences between two distinct versions, in terms of words that have been missing, changed, omitted or displaced. Traditionally, Sanskrit is written without spaces between words, and the word order can be changed without altering the meaning of a sentence. This paper describes the characteristics whic...

Word Segmentation in Sanskrit Using Path Constrained Random Walks

In Sanskrit, the phonemes at the word boundaries undergo changes to form new phonemes through a process called sandhi. A fused sentence can be segmented into multiple possible segmentations. We propose a word segmentation approach that predicts the most semantically valid segmentation for a given sentence. We treat the problem as a query expansion problem and use the path-constrained random ...

Discourse Analysis of Sanskrit texts

The last decade has seen rigorous activities in the field of Sanskrit computational linguistics pertaining to word level and sentence level analysis. In this paper we point out the need of special treatment for Sanskrit at discourse level owing to specific trends in Sanskrit in the production of its literature ranging over two millennia. We present a tagset for inter-sentential analysis followe...

Improving the Morphological Analysis of Classical Sanskrit

The paper describes a new tagset for the morphological disambiguation of Sanskrit, and compares the accuracy of two machine learning methods (CRF, deep recurrent neural networks) for this task, with a special focus on how to model the lexicographic information. It reports a significant improvement over previously published results.

Publication date: 2016
